Derivative pricing using neural networks#
Over the past few years, advancements in artificial intelligence and machine learning have naturally led to the adoption of these techniques in many other fields. Neural networks, for example, have become a highly flexible tool capable of solving a wide variety of problems such as image classification, speech recognition, and language translation.
Today, we also see their application in various industries, including finance. In this notebook, we demonstrate how these models can be used, for instance, to price derivatives. Although derivative pricing is a relatively simple and well-studied problem, it is an instructive example of how neural networks can solve financial problems, not only in pricing but also in inverse problems (e.g., model calibration) and risk management.
In this entry, we focus on a particular technique: Physics-Informed Neural Networks (PINNs). As the name suggests, these neural networks are designed to incorporate additional information derived from physical laws or known mathematical relationships. Typically, this is achieved by embedding these laws into the loss function to guide the network toward a solution that satisfies the given differential equation.
What are Physics-Informed Neural Networks (PINNs)?#
We begin with some definitions. Suppose we wish to solve a partial differential equation (PDE) of the form \( \mathcal{N}[V(\mathbf{x})] = 0,\quad \mathbf{x} \in \Omega, \) where:
\(\mathcal{N}\) is a differential operator,
\(V(\mathbf{x})\) is the unknown solution,
\(\Omega\) is the spatial (and possibly temporal) domain.
In a PINN, we approximate \(V(\mathbf{x})\) with a neural network \(V_\theta(\mathbf{x})\) parameterized by \(\theta\). The loss function is constructed to penalize deviations from both the governing PDE in the domain and the prescribed boundary conditions on \(\partial \Omega\). Formally, the loss can be written as:
\[
\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{PDE}}(\theta) + \mathcal{L}_{\mathrm{BC}}(\theta),
\]
with
\[
\mathcal{L}_{\mathrm{PDE}}(\theta) = \frac{1}{N_r} \sum_{i=1}^{N_r} \left\lvert \mathcal{N}\bigl[V_\theta(\mathbf{x}_r^i)\bigr] \right\rvert^2
\]
and
\[
\mathcal{L}_{\mathrm{BC}}(\theta) = \frac{1}{N_b} \sum_{j=1}^{N_b} \left\lvert V_\theta(\mathbf{x}_b^j) - g(\mathbf{x}_b^j) \right\rvert^2,
\]
where:
\(\{\mathbf{x}_r^i\}_{i=1}^{N_r}\) are the collocation points in the interior of the domain \(\Omega\),
\(\{\mathbf{x}_b^j\}_{j=1}^{N_b}\) are the points on the boundary \(\partial \Omega\) with known values \(g(\mathbf{x}_b^j)\).
In order to train the network, we minimize the loss function using gradient-based optimization techniques. The resulting neural network will approximate the solution to the PDE in the domain \(\Omega\) and satisfy the boundary conditions on \(\partial \Omega\).
To keep things simple, we start directly with the example. First, we import the necessary libraries:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from torch.autograd import grad
from scipy.stats import norm
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
np.random.seed(42)
torch.manual_seed(42)
Loss functions and the governing equation#
For the problem of pricing options, the fundamental PDE we need to solve is the Black-Scholes equation. This equation is a parabolic PDE that describes the evolution of the price \(V(S,t)\) of an option in terms of time \(t\) and the underlying asset price \(S\). It is given by:
\[
\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + r S \frac{\partial V}{\partial S} - r V = 0,
\]
where:
\(V(S,t)\) is the option price,
\(S\) is the price of the underlying asset,
\(r\) is the risk-free interest rate,
\(\sigma\) is the volatility of the underlying asset.
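For a European call, this PDE also admits the well-known closed-form solution, which gives us a benchmark to compare the network against later. Below is a sketch; the helper name `black_scholes_call` is our own choice, and it uses the `scipy.stats.norm` import from above:

```python
import numpy as np
from scipy.stats import norm

def black_scholes_call(s, k, t, r, sigma):
    """Closed-form Black-Scholes price of a European call,
    with t the time to maturity."""
    s = np.asarray(s, dtype=float)
    d1 = (np.log(s / k) + (r + 0.5 * sigma**2) * t) / (sigma * np.sqrt(t))
    d2 = d1 - sigma * np.sqrt(t)
    return s * norm.cdf(d1) - k * np.exp(-r * t) * norm.cdf(d2)
```

Evaluating this formula on a grid of \((S, t)\) points lets us measure the pointwise error of the trained network.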
Our objective is to use this PDE to guide the training of our neural network. Modern machine learning libraries, such as PyTorch, offer automatic differentiation capabilities: torch.autograd.grad efficiently computes the necessary derivatives, such as \(\nabla V(S,t)\), at each training point. In our formulation, \(\mathcal{L}_{\mathrm{PDE}}(\theta)\) is represented by the following function:
def pde_loss_f(model, inputs, sigma, r):
    inputs.requires_grad_(True)
    V = model(inputs)
    # First-order derivatives w.r.t. (t, S)
    gradients = grad(V, inputs, grad_outputs=torch.ones_like(V),
                     create_graph=True)[0]
    dVdt = gradients[:, 0]
    dVdS = gradients[:, 1]
    # Second-order derivative w.r.t. S
    d2VdS2 = grad(dVdS, inputs, grad_outputs=torch.ones_like(dVdS),
                  create_graph=True)[0][:, 1]
    S = inputs[:, 1]
    V = V[:, 0]
    # Black-Scholes PDE residual
    pde_residual = dVdt + 0.5 * sigma ** 2 * S ** 2 * d2VdS2 + r * S * dVdS - r * V
    loss_pde = torch.mean(pde_residual ** 2)
    return loss_pde
Additionally, to incorporate the boundary conditions, we need to define an additional loss function that forces the network to satisfy these constraints. Since the boundary conditions we are using are of Dirichlet type, the loss function is relatively simple (it merely compares the network outputs against the ground truth). However, nothing prevents us from employing other types of boundary conditions (such as Neumann or Robin).
def boundary_loss_f(model, inputs, outputs):
    V_pred = model(inputs)
    boundary_loss = torch.mean((V_pred - outputs) ** 2)
    return boundary_loss
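Both loss functions assume a `model` that maps an input pair \((t, S)\) to a price \(V\). The text does not prescribe an architecture, so the following fully connected network is only an illustrative sketch; width, depth, and the `tanh` activation are our own choices:

```python
import torch
import torch.nn as nn

class PINN(nn.Module):
    """Minimal feedforward network mapping (t, S) -> V.
    Hidden width and depth are illustrative, not prescribed."""
    def __init__(self, hidden=64, n_hidden_layers=3):
        super().__init__()
        layers = [nn.Linear(2, hidden), nn.Tanh()]
        for _ in range(n_hidden_layers):
            layers += [nn.Linear(hidden, hidden), nn.Tanh()]
        layers.append(nn.Linear(hidden, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # x has columns (t, S), matching the indexing in pde_loss_f
        return self.net(x)
```

A smooth activation such as `tanh` is a common choice here, since the PDE loss requires second derivatives of the network output.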
Collocation points#
To solve the PDE using PINNs, we need to define the numerical domain over which the equation will be solved. Much like traditional numerical methods (e.g., finite differences), we select a set of points, called collocation points, where the PDE is enforced. These include points in the interior of the domain \(\Omega\) as well as on its boundary \(\partial\Omega\).
To enforce the boundary conditions, additional loss terms are added that force the network to satisfy them. For a European call option, the typical boundary conditions are:
At \(S = 0\): \(V(0, t) = 0\), since if the underlying asset’s price is zero, the option is worthless.
For \(S \to \infty\): \(V(S, t) \approx S - K e^{-r(T-t)}\), as the option behaves like a linear payoff for large \(S\).
At expiration \(t = T\): \(V(S, T) = \max(S - K,\, 0)\), which is simply the payoff of the option.
We sample these points using random numbers and use them to train the neural network. Below is an implementation for sampling them:
def payoff(s, k):
    return np.maximum(s - k, 0)

def interior_samples(option_config, scaling=4):
    t = np.random.uniform(
        0, option_config['maturity'], option_config['n_samples'])
    s = np.random.uniform(
        0, scaling * option_config['strike'], option_config['n_samples'])
    tag = np.zeros(option_config['n_samples'])
    output = np.zeros(option_config['n_samples'])
    return t, s, tag, output

def top_boundary_samples(option_config, scaling=4):
    t = np.random.uniform(
        0, option_config['maturity'], option_config['n_samples'])
    strike = option_config['strike']
    s_max = scaling * strike
    s = np.ones(option_config['n_samples']) * s_max
    r = option_config['r']
    T = option_config['maturity']
    output = s_max - strike * np.exp(-r * (T - t))
    tag = np.ones(option_config['n_samples'])
    return t, s, tag, output

def bottom_boundary_samples(option_config):
    t = np.random.uniform(
        0, option_config['maturity'], option_config['n_samples'])
    s = np.zeros(option_config['n_samples'])
    output = np.zeros(option_config['n_samples'])
    tag = np.ones(option_config['n_samples'])
    return t, s, tag, output

def initial_condition_samples(option_config, scaling=4):
    t = np.ones(option_config['n_samples']) * option_config['maturity']
    s = np.random.uniform(
        0, scaling * option_config['strike'], option_config['n_samples'])
    tag = np.ones(option_config['n_samples'])
    output = payoff(s, option_config['strike'])
    return t, s, tag, output
Important
The choice of collocation points is critical for the convergence of the method. One potential improvement is to use quasi-random numbers to distribute the collocation points more uniformly across the domain, thereby avoiding clustering that can occur with some random number generators.
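As a sketch of that idea, interior points could be drawn from a scrambled Sobol sequence via `scipy.stats.qmc`; the function name `interior_samples_sobol` is hypothetical and simply mirrors `interior_samples` above:

```python
import numpy as np
from scipy.stats import qmc

def interior_samples_sobol(option_config, scaling=4):
    """Interior collocation points from a scrambled Sobol sequence.
    Note: qmc.Sobol warns if n_samples is not a power of two."""
    sampler = qmc.Sobol(d=2, scramble=True, seed=42)
    u = sampler.random(option_config['n_samples'])  # low-discrepancy points in [0, 1)^2
    t = u[:, 0] * option_config['maturity']
    s = u[:, 1] * scaling * option_config['strike']
    tag = np.zeros(option_config['n_samples'])
    output = np.zeros(option_config['n_samples'])
    return t, s, tag, output
```

Low-discrepancy sequences cover the rectangle \([0, T] \times [0, S_{\max}]\) more evenly than pseudo-random draws, which can help the PDE residual be enforced uniformly across the domain.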
Of course, we need to define the financial parameters that will be used in the problem. For this example, we will use the following values:
config = {'strike': 15, 'maturity': 1,
          'r': 0.04, 'sigma': 0.25, 'n_samples': 10000}
# Generate samples
inner_samples = interior_samples(config)
top_samples = top_boundary_samples(config)
bottom_samples = bottom_boundary_samples(config)
initial_samples = initial_condition_samples(config)
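Before training, the NumPy arrays returned by these samplers need to be converted to tensors on `device`. One possible helper for that assembly (the name `to_tensors` and the column layout are our own choices):

```python
import numpy as np
import torch

def to_tensors(samples, device="cpu"):
    """Stack (t, s) into a feature tensor of shape (n, 2) and return the
    target values as a column tensor; tag is unused here."""
    t, s, tag, output = samples
    x = torch.tensor(np.stack([t, s], axis=1), dtype=torch.float32, device=device)
    y = torch.tensor(output, dtype=torch.float32, device=device).unsqueeze(1)
    return x, y
```

The feature column order \((t, S)\) matches the indexing used inside `pde_loss_f`, where column 0 is differentiated with respect to time and column 1 with respect to the asset price.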
Visually, this is what our domain looks like: